MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

Introduction

Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted considerable attention. Many studies report that these DL-based models offer satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared with quality scores, fewer studies have developed deep learning models to estimate intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, that simultaneously predicts human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict human subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-task learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, the intelligibility and WER scores predicted by MTI-Net are highly correlated with the ground-truth scores.

For more details, please see our paper.
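To make the multi-task idea above concrete, here is a minimal sketch of a joint objective over the two targets (intelligibility and WER) and of the correlation used to compare predictions with ground-truth scores. It is illustrative only: the use of mean squared error, the weights `w_intell` and `w_wer`, and all function names are assumptions, not the objective implemented in MTI-Net.py; the paper defines the actual loss.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_task_loss(pred_intell, true_intell, pred_wer, true_wer,
                    w_intell=1.0, w_wer=1.0):
    """Weighted sum of per-target losses.

    The weights and the choice of MSE are hypothetical; they only
    illustrate how two prediction targets can share one objective.
    """
    return (w_intell * mse(pred_intell, true_intell)
            + w_wer * mse(pred_wer, true_wer))


def pearson(x, y):
    """Pearson linear correlation coefficient between two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)
```

For example, `pearson(predicted_scores, ground_truth_scores)` returns a value near 1 when the predictions track the ground truth closely.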

Installation

The environment setup is provided in the Environment folder. Download it and create the conda environment with the following command:

conda env create -f environment.yml

Please note that the above environment is intended specifically for running MTI-Net.py. To fine-tune the self-supervised learning (SSL) model and generate SSL features, please use Python 3.6 and follow the fairseq instructions to deploy the fairseq module.

Fine-tune the SSL Model and Extract SSL Features

Please use the following command to fine-tune the SSL model:

python FT_SSL_Feat.py

To extract the SSL features, please use the following command:

python Extract_FT_SSL.py

Training and Testing MTI-Net

Please use the following command to train the model:

python MTI-Net.py --gpus <assigned GPU> --mode train

For the testing stage, please use the following command:

python MTI-Net.py --gpus <assigned GPU> --mode test
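The two commands above differ only in the `--mode` flag. As a rough sketch of how a script could handle such flags, the following uses Python's standard `argparse` module. Only the flag names `--gpus` and `--mode` and the values `train` and `test` come from the commands above; the defaults, the use of `CUDA_VISIBLE_DEVICES`, and the function names are assumptions and may not match what MTI-Net.py actually does.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse --gpus and --mode (defaults here are hypothetical)."""
    parser = argparse.ArgumentParser(description="Illustrative train/test CLI")
    parser.add_argument("--gpus", type=str, default="0",
                        help="GPU id(s) to expose, e.g. '0' or '0,1'")
    parser.add_argument("--mode", type=str, choices=["train", "test"],
                        default="train", help="run training or testing")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Assumption: restrict visible GPUs before any DL framework is imported.
    os.environ["CUDA_VISIBLE_DEVICES"] = args.gpus
    if args.mode == "train":
        return "train"  # placeholder for a training routine
    return "test"       # placeholder for an evaluation routine
```

Calling `main(["--gpus", "0", "--mode", "test"])` selects the testing branch; an unsupported mode value makes `argparse` exit with a usage error.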

Citation

Please cite our paper if you find this code useful.

Zezario, R.E., Fu, S.-w., Chen, F., Fuh, C.-S., Wang, H.-M., Tsao, Y. (2022) MTI-Net: A Multi-Target Speech Intelligibility Prediction Model. Proc. Interspeech 2022, 5463-5467

Note

The Self-Attention, SincNet, and self-supervised learning (SSL) model implementations were created by others.
